Automatic Creation of Knowledge Graphs from Digital Musical Document Libraries

Authors

  • Sergio Oramas
  • Mohamed Sordo
  • Xavier Serra

Abstract

Most current musicological knowledge is found in printed books and manuscripts. In recent years, great efforts have been made to digitize these documents and make them available as Digital Libraries. However, digital documents are mostly stored as raw text, with no more structure than indexes and some metadata. As a result, computers cannot understand or process the knowledge implicit in the text. Automatic processing of text documents may help musicologists in several ways, such as improving navigation through a library, discovering hidden knowledge, and speeding up tedious tasks. To apply these techniques to a Digital Library, the information contained in its documents must be carefully structured and semantically annotated. Information Extraction is a discipline of computer science focused on extracting structured information from unstructured text sources. We propose a method that uses Information Extraction techniques to automatically extract meaningful knowledge from the documents in Digital Musical Document Libraries. The method has two main steps. First, relevant named entities (e.g., composers, organizations, and places) are identified in the text. Second, the words between these entities are analyzed syntactically and semantically to determine the relationship between them. The extracted knowledge is then represented in a machine-readable format as a knowledge graph, in which entities are nodes and relations are edges. Finally, this knowledge representation is visualized as an interactive graph, so that users can move from one document to another by browsing the knowledge graph. We tested our method on a subset of the artist biographies in Grove Music Online.
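The abstract describes the pipeline only at a high level. The following is a minimal sketch of its data flow, not the authors' implementation. It assumes a hypothetical hand-written gazetteer in place of a real named-entity recognizer. It also treats the raw words between two consecutive entities as the relation label, which is a naive stand-in for the syntactic and semantic analysis described above. The gazetteer, the function names, and the example sentences are all illustrative. The result is a graph whose nodes are entities and whose edges are relations, plus a lookup that links documents mentioning a shared entity, mimicking document-to-document browsing.

```python
import re
from collections import defaultdict

# Hypothetical toy gazetteer standing in for a real named-entity recognizer.
# Overlapping names (e.g. "Paris" inside "Paris Conservatoire") are not handled.
GAZETTEER = {
    "Franz Liszt": "composer",
    "Carl Czerny": "composer",
    "Vienna": "place",
}


def find_entities(text):
    """Return (start, end, name) spans for every gazetteer match, left to right."""
    spans = []
    for name in GAZETTEER:
        for m in re.finditer(re.escape(name), text):
            spans.append((m.start(), m.end(), name))
    return sorted(spans)


def extract_triples(text):
    """Naive relation step: the words between two consecutive entities become the edge label."""
    spans = find_entities(text)
    triples = []
    for (_, end1, subj), (start2, _, obj) in zip(spans, spans[1:]):
        between = text[end1:start2].strip(" ,.;")
        if between:  # skip adjacent entities with no connecting words
            triples.append((subj, between, obj))
    return triples


def build_graph(docs):
    """Build nodes (entity -> type), labelled edges, and entity -> document mentions."""
    nodes = {}
    edges = []  # (subject, relation, object, source document id)
    mentions = defaultdict(set)
    for doc_id, text in docs.items():
        for _, _, name in find_entities(text):
            nodes[name] = GAZETTEER[name]
            mentions[name].add(doc_id)
        for subj, rel, obj in extract_triples(text):
            edges.append((subj, rel, obj, doc_id))
    return nodes, edges, mentions


def linked_documents(doc_id, mentions):
    """Documents reachable from doc_id through at least one shared entity node."""
    related = set()
    for doc_ids in mentions.values():
        if doc_id in doc_ids:
            related |= doc_ids
    related.discard(doc_id)
    return related
```

With two toy biographies, "Franz Liszt was a pupil of Carl Czerny." and "Carl Czerny was born in Vienna.", the sketch yields the edges (Franz Liszt, was a pupil of, Carl Czerny) and (Carl Czerny, was born in, Vienna). The two documents are linked through the shared Carl Czerny node. A real system would replace both the gazetteer and the "words in between" heuristic with proper NER and relation analysis.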

Similar sources

Assessing Compliance with Knowledge Management Criteria on the Websites of Selected Digital Libraries in Iran

Background and Aim: Considering the elements of knowledge management (availability, creation, and transfer of knowledge) is very important for digital library websites and improves their performance. This paper therefore aims to identify the knowledge management criteria on selected Iranian digital library websites and to study the degree to which they are observed. Materials and Methods: The research method was des...

A Fuzzy Logic Based Expert System for Quality Assurance of Document Image Collections

Huge document image collections in digital libraries are prone to reduced quality and require automatic quality assurance. This paper presents an approach for bringing together information automatically aggregated from a quality assurance tool and expert knowledge related to digital preservation. The main contribution of this work is the definition of fuzzy expert rules and the application of f...

Expanding a Humanities Digital Library: Musical References in Cervantes' Works

Digital libraries focused on developing humanities resources for both scholarly and popular audiences face the challenge of bringing together digital resources built by scholars from different disciplines and subsequently integrating and presenting them. This challenge becomes more acute as libraries grow, both in terms of size and organizational complexity, making the traditional humanities pr...

Interfaces for Document Representation in Digital Music Libraries

Musical documents, that is, documents whose primary content is printed music, introduce interesting design challenges for presentation in an online environment. Considerations for the unique properties of printed music, as well as users’ expected levels of comfort with these materials, present opportunities for developing a viewer specifically tailored to displaying musical documents. This paper...

Header Metadata Extraction from Semi-structured Documents Using Template Matching

With the recent proliferation of documents, automatic metadata extraction from documents has become an important task. In this paper, we propose a novel template-matching-based method for header metadata extraction from semi-structured documents stored in PDF. In our approach, templates are defined, and the document is considered as strings with format. Templates are used to guide finite state auto...

Publication date: 2014